ABSTRACT

We use simulations to demonstrate that photometric redshift "errors" can be greatly reduced by 
using the photometric redshift probability distribution p(z) rather than a one-point estimate such as 
the most likely redshift. In principle this involves tracking a large array of numbers rather than a 
single number for each galaxy. We introduce a very simple estimator that requires tracking only a 
single number for each galaxy, while retaining the systematic-error-reducing properties of using the 
full p(z) and requiring only very minor modifications to existing photometric redshift codes. We find 
that using this redshift estimator (or using the full p(z)) can substantially reduce systematics in dark 
energy parameter estimation from weak lensing, at no cost to the survey. 
Subject headings: surveys — galaxies: photometry — methods: statistical 



1. INTRODUCTION 

Photometric redshifts (Connolly et al. 1995, Hogg et 
al. 1998, Benitez 2000) are a key component of galaxy 
surveys. As surveys get larger, reducing statistical uncer- 
tainties, systematic errors become more important. Sys- 
tematic errors in photometric redshifts are therefore a 
top concern for future large galaxy surveys, for example 
as highlighted by the Dark Energy Task Force (Albrecht 
et al. 2006). 

Much of the concern has centered on "catastrophic outliers",
which are galaxies for which the photometric redshift
is very wrong, for example when mistaking the Lyman
break at z ~ 3 for the 4000 Å break at very low redshift.
Even a small fraction of outliers can significantly
impact the downstream science, and modeling this impact
requires going beyond simple Gaussian models of
photometric redshift errors.

In many cases, however, outliers are "catastrophic"
only because they have a multimodal redshift probability
p(z), which cannot be accurately represented by
a single number such as the most probable redshift.
Fernandez-Soto et al. (2002) showed that after defining
confidence intervals around the p(z) peaks, 95% of galaxies
in their sample had spectroscopic redshifts within the
95% confidence interval and 99% had spectroscopic redshifts
within the 99% confidence interval. Yet the same
data appear to contain catastrophic outliers on a plot
where each galaxy is represented only by a point with
symmetric errorbars.

There is a second motivation for using the full p(z). 
The redshift ambiguities described above are due to 
color-space degeneracies. But even without these degen- 
eracies, photometric redshift errors should be asymmet- 
ric about the most probable redshift due to the nonlin- 
ear mapping of redshift into color space. Avoiding bi- 
ases from this effect also requires reference to p(z). In- 
deed, Mandelbaum et al. (2008) showed that using p(z) 
in Sloan Digital Sky Survey (SDSS) data (which are not 
deep enough to suffer serious degeneracies) substantially 
reduced systematic calibration errors for galaxy-galaxy 
weak lensing. 

In this paper we demonstrate, using simple simulations,
the reduction in systematic error that can result
from using the full p(z) in a deep survey with significant
degeneracies. We also introduce a simple way to reduce
the computational cost of doing so.

2. SIMULATIONS 

We conducted simulations similar to those in Mar- 
goniner & Wittman (2008) and Wittman et al. (2007), 
which used the Bayesian Photometric Redshift (BPZ, 
Benitez 2000) code, including its set of six template 
galaxy spectral energy distributions (SEDs) and its set 
of priors on the joint magnitude-SED type-redshift dis- 
tribution. We started with an actual R band catalog 
from the Deep Lens Survey (DLS, Wittman et al. 2002). 
For each galaxy, we used the R magnitude to generate a 
mock type and redshift according to the priors, and then 
generated synthetic colors in the BVRz' filter set used by 
DLS. (The filter set is not central to the argument here, 
but one must be used for concreteness.) We then added 
photometry noise and zero-point errors representative of 
the DLS. The color distributions in the resulting mock 
catalog were similar to those of the actual catalog, in- 
dicating that the mock catalog is consistent with a real 
galaxy survey. 

We then ran the mock catalog of 83,000 galaxies
through BPZ, saving the full p(z). In a post-processing
stage, we can extract from the full p(z) not only the most
probable redshift (which had already been determined by
BPZ and labeled zb), but other candidate one-point estimates
such as the mean and median of p(z), as well as
the summed p(z) for any desired set of galaxies.
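
The post-processing step described above can be sketched in a few lines. This is only a minimal illustration, assuming each p(z) has been saved on a uniform redshift grid; the function names (one_point_estimates, summed_pz), the grid, and the toy bimodal p(z) are hypothetical and are not part of BPZ.

```python
import numpy as np

def one_point_estimates(z_grid, pz):
    """Mode, mean and median of one gridded p(z) (uniform grid assumed)."""
    pz = np.asarray(pz, dtype=float)
    pz = pz / pz.sum()                            # normalise to unit total probability
    z_mode = z_grid[np.argmax(pz)]                # most probable redshift, analogous to zb
    z_mean = float(np.sum(z_grid * pz))
    cdf = np.cumsum(pz)
    z_median = z_grid[np.searchsorted(cdf, 0.5)]  # first grid point with cdf >= 0.5
    return z_mode, z_mean, z_median

def summed_pz(pz_array):
    """Sum the normalised p(z) of a set of galaxies (rows: galaxies, columns: grid)."""
    pz_array = np.asarray(pz_array, dtype=float)
    normalised = pz_array / pz_array.sum(axis=1, keepdims=True)
    return normalised.sum(axis=0)

# Toy bimodal p(z): 70% of the probability near z = 0.3, 30% near z = 2.8 (assumed values).
z_grid = np.linspace(0.0, 4.0, 401)
pz = (0.7 * np.exp(-0.5 * ((z_grid - 0.3) / 0.05) ** 2)
      + 0.3 * np.exp(-0.5 * ((z_grid - 2.8) / 0.05) ** 2))

z_mode, z_mean, z_median = one_point_estimates(z_grid, pz)
stacked = summed_pz(np.vstack([pz, pz]))          # summed p(z) of two identical galaxies
```

For this toy case the mode and median sit in the low-redshift peak while the mean falls between the peaks, where the galaxy has essentially no probability of lying; none of the three numbers conveys the bimodality.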

3. RESULTS 

We first show one of the traditional one-point estimates
to illustrate the problem more clearly. The top panel of
Figure 1 shows the most probable redshift zb vs. true
redshift z. To render both the high- and low-density parts
of this plot accurately, we show it as a colormap rather
than a scatterplot. The core is rather tight, requiring a
logarithmic mapping between color and density to bring
out the more subtle features in the wings. With this
mapping, the systematics are clear: a tendency to put
galaxies truly at z ~ 2-3 at very low redshift; a tendency
to put galaxies truly at low redshift at zb ~ 1.4-2;
and asymmetric horizontal smearing in several different
zb intervals, e.g. at zb ~ 1.5. The specifics of the features
depend on the filter set, but their general appearance
is typical. (Note that they will be difficult to see
in plots based on spectroscopic followup of deep imaging
surveys, as the brighter, spectroscopically accessible
galaxies form a much tighter relation.) Corresponding
plots are not shown for other one-point estimates such
as the mean and median of p(z), but their systematic
deviations are similar or worse.

Fig. 1. Top: each galaxy's most probable photometric redshift
(zb) vs. spectroscopic redshift. Bottom: Monte Carlo sampling
from each galaxy's redshift probability distribution (zmc)
vs. spectroscopic redshift. Sampling the distribution cleans up
many artifacts introduced by considering only the most probable
photometric redshift.

Which systematics are most important depends on the
application. In this paper we consider two-point correlations
of weak gravitational lensing (cosmic shear), which
require that the photometric redshift distribution of a
sample of galaxies be as close as possible to the true redshift
distribution; errors on specific galaxies are not important.
Also, accurate knowledge of the scatter is much
more important than minimizing the amount of scatter.
Thus, Figure 1 would ideally be symmetric about a line of
unity slope, reflecting the fact that the photometric and
true redshift distributions are identical (footnote 1). This is
clearly not the case for the top panel of Figure 1.

Footnote 1: In practice, the two distributions would be the same
for any subsample, not only for the entire sample. We address this
later, in Figure 3.

Ideally p(z) contains the required information lacking
in the single number zb, but it is also more difficult to
work with, requiring the storage and manipulation of
an array of numbers for each galaxy. We simplify the
computational bookkeeping by defining a single number
which is by construction representative of the full p(z).
This estimate is simply a random number distributed
according to the probability distribution p(z) and is denoted
by zmc because it is a Monte Carlo sample of
the full p(z). Specifically, for each galaxy, a random
number x is drawn uniformly from the interval [0,1),
and the Monte Carlo redshift zmc is defined such that
$\int_0^{z_{\rm mc}} p(z')\,dz' = x$. Figure 2 illustrates the process. The
bottom panel shows p(z) as usually plotted, while the top
panel shows the cumulative p(z), that is, the probability
that the galaxy lies at redshift less than the value on the
abscissa. A random number in the range 0-1 is drawn,
in this case 0.32, and the redshift at which the cumulative
p(z) has a value of 0.32 (dotted line) is recorded as
zmc, in this case 0.47. This results in a single number
for each galaxy, which remains unbiased even if p(z) is
multimodal and/or asymmetric. Of course, some precision
is lost in this process; it should be avoided when
studying a small number of galaxies in great detail, but
for large samples of galaxies the distribution of zmc must
converge to the summed p(z) of the sample. Furthermore, it
requires only a minor modification to most photometric
redshift codes.

Fig. 2. Construction of the zmc estimator for an example
galaxy. The bottom panel shows p(z) as usually plotted. The top
panel shows the cumulative p(z), that is, the probability that the
galaxy lies at redshift less than the value on the abscissa. A random
number in the range 0-1 is drawn, in this case 0.32, and the redshift
at which the cumulative p(z) has a value of 0.32 is recorded as the
Monte Carlo estimate zmc.
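
The construction of zmc amounts to inverse-transform sampling of each galaxy's p(z). The sketch below is one possible implementation, not the code used for the paper: it assumes p(z) is stored on a common grid, and the function name monte_carlo_redshifts, the seed, and the toy bimodal p(z) are all hypothetical.

```python
import numpy as np

def monte_carlo_redshifts(z_grid, pz_array, rng):
    """Draw one redshift per galaxy by inverting the cumulative p(z) of each row."""
    pz_array = np.atleast_2d(np.asarray(pz_array, dtype=float))
    cdf = np.cumsum(pz_array, axis=1)
    cdf = cdf / cdf[:, -1:]                  # each row now ends at exactly 1
    x = rng.random(cdf.shape[0])             # uniform in [0, 1), one draw per galaxy
    idx = (cdf <= x[:, None]).sum(axis=1)    # index of the first grid point with cdf > x
    return z_grid[idx]

# Toy check: 5000 galaxies sharing one bimodal p(z), with 30% of the
# probability in a high-redshift peak (assumed values, for illustration only).
rng = np.random.default_rng(42)
z_grid = np.linspace(0.0, 4.0, 401)
pz = (0.7 * np.exp(-0.5 * ((z_grid - 0.3) / 0.05) ** 2)
      + 0.3 * np.exp(-0.5 * ((z_grid - 2.8) / 0.05) ** 2))
pz_array = np.tile(pz, (5000, 1))

zmc = monte_carlo_redshifts(z_grid, pz_array, rng)
high_z_fraction = np.mean(zmc > 1.5)         # should be close to 0.3 for a large sample
```

This version returns values on the grid itself, so its resolution is the grid spacing; interpolating the cumulative p(z) between grid points would be a natural refinement. Storing the seed alongside the catalog keeps the zmc values reproducible.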

The bottom panel of Figure 1 shows zmc vs. true redshift.
Clearly, the systematics are vastly improved. Even with
the logarithmic scaling, it is difficult to see departures
from symmetry about a line of unity slope. We therefore
compare one-dimensional histograms in what follows. A
typical use of photometric redshifts in a galaxy survey
will be to bin the galaxies by redshift, for example to
compute shear correlations in redshift shells. Because
the true redshifts will not be known, the galaxies must
be binned by some photometric redshift criterion. For
simplicity, we choose zb.

Figure 3 (upper panel) shows true and inferred redshift
distributions of galaxies in four zb bins: 0-0.1, 0.4-0.5,
0.9-1.1, and 1.4-1.6. The true distribution is shown
in black, the distribution inferred from zmc is shown
in red, and the distribution inferred from summing the
galaxies' p(z) is shown in blue. The asymmetry and the
wings of the true redshift distribution are well captured
by zmc or by summing p(z). By comparison, using each
galaxy's most probable redshift zb to infer the redshift
distributions would have resulted in four vertical-sided
bins, which would become roughly Gaussian after convolving
with the typical galaxy's zb uncertainty. It is
clear that this would not capture the true redshift distribution
nearly as well as the p(z) method does. For example,
the inset in Figure 3 shows a small high-redshift
(z ~ 2.5) bump in the 0.4-0.5 bin which is captured by
zmc or by summing p(z). Looking only at the most
probable redshift zb would result in these galaxies being
considered catastrophic outliers.

Fig. 3. Black: true redshift distributions of photometric redshift
cuts zb < 0.1, 0.4-0.5, 0.9-1.1, and 1.4-1.6 in the simulation.
Red: distributions of the zmc estimator for the same sets
of galaxies. Blue: summed p(z) for the same sets of galaxies. The
simple estimator zmc does as well as tracking the full p(z). Using
each galaxy's most probable redshift zb to infer the redshift
distributions would result in four vertical-sided bins. Inset: a
high-redshift bump in the true redshift distribution of galaxies with
0.4 < zb < 0.5 is captured by the summed p(z) or by zmc, but
would be considered "catastrophic outliers" if zb were used.
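
The comparison of the two p(z)-based estimates within a zb bin can be sketched as follows. This is a toy illustration rather than the analysis behind the figure: the function name binned_nz, the grid, and the catalog (2000 galaxies sharing one bimodal p(z) with 10% of the probability at z ~ 2.5) are assumptions made here for concreteness.

```python
import numpy as np

def binned_nz(z_grid, pz_array, zb, zmc, zb_lo, zb_hi):
    """Redshift distribution of galaxies with zb_lo <= zb < zb_hi, estimated two ways."""
    in_bin = (zb >= zb_lo) & (zb < zb_hi)
    # (a) summed p(z): add the normalised p(z) of every selected galaxy
    pz_sel = pz_array[in_bin]
    pz_sel = pz_sel / pz_sel.sum(axis=1, keepdims=True)
    nz_summed = pz_sel.sum(axis=0)
    # (b) histogram of zmc, using bins centred on the same grid points
    step = z_grid[1] - z_grid[0]
    edges = np.append(z_grid - step / 2, z_grid[-1] + step / 2)
    nz_zmc, _ = np.histogram(zmc[in_bin], bins=edges)
    return nz_summed, nz_zmc

rng = np.random.default_rng(0)
z_grid = np.linspace(0.0, 4.0, 81)
pz = (0.9 * np.exp(-0.5 * ((z_grid - 0.45) / 0.1) ** 2)
      + 0.1 * np.exp(-0.5 * ((z_grid - 2.5) / 0.1) ** 2))
n_gal = 2000
pz_array = np.tile(pz, (n_gal, 1))
zb = np.full(n_gal, 0.45)                                # every galaxy peaks at z = 0.45
zmc = rng.choice(z_grid, size=n_gal, p=pz / pz.sum())    # one Monte Carlo draw per galaxy

nz_summed, nz_zmc = binned_nz(z_grid, pz_array, zb, zmc, 0.4, 0.5)
high = z_grid > 1.5
frac_summed = nz_summed[high].sum() / n_gal              # high-redshift share from summed p(z)
frac_zmc = nz_zmc[high].sum() / n_gal                    # high-redshift share from zmc
```

Both estimates recover the high-redshift bump at roughly the 10% level built into the toy p(z), whereas a histogram of zb would place every galaxy at z = 0.45.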

Another way to reduce "catastrophic outliers" and re- 
lated systematics might be to discard galaxies whose p(z) 
is multimodal, not sharply peaked, or otherwise fails 
some test. This may be effective, but it greatly reduces 
the number of galaxies available to work with. The zmc 
and full p(z) methods accurately reflect the true redshift 
distributions without requiring any reduction in galaxy 
sample size. This is important for applications such as
lensing, for which galaxy shot noise will always be an
issue.

Having demonstrated that using p(z) (whether by sam- 
pling or by using the full distribution) greatly reduces 
photometric redshift systematics, two questions natu- 
rally arise. How much better is it in terms of a science- 
based metric? And what are the remaining errors or 
limitations? Because zmc and the full p(z) give very
similar results, references to using p(z) in the remainder
of the paper should be understood to include either
implementation.

Fig. 4. Redshift bias (inferred minus true mean redshift) for
the four bins shown in Fig. 3 when using the full p(z) (solid) and
when using the most probable redshift zb for each galaxy (dotted).
The results for zmc are nearly indistinguishable from those for the
full p(z) and are not shown. Top panel: BVRz' filter set and DLS
noise model. Bottom panel: UBRIJHK filter set and optimistic
noise model with photometric signal-to-noise of 10 in each filter,
regardless of redshift or magnitude.

4. EFFECT ON DARK ENERGY PARAMETER 
ESTIMATION 

For each of the four zb bins shown in Fig. 3, we compute
the redshift bias (inferred minus true mean redshift)
for the full p(z) and zb approaches. These are shown in
Fig. 4 as the solid and dotted lines respectively (the results
for zmc are nearly indistinguishable from those for
the full p(z) and are not shown). For the DLS filter set
and noise model, the bias is within a few hundredths
of a unit redshift when using p(z), but only within several
tenths when using the most probable redshift zb.
For lensing applications, the zb bias at low redshift is
exaggerated somewhat, because it results from a small
outlying bump at high redshift, as shown in the inset of
Fig. 3. In a real survey, these high-redshift interlopers
would be smaller and fainter than the galaxies that really
belong in the low-redshift bin, and therefore have
less precise shape measurements and smaller weight on
the shear statistics. We account for this effect by assigning
a lensing weight to each mock galaxy based on
its magnitude and drawn from the actual distribution of
weights as a function of magnitude in the DLS. This reduces
the bias of the lowest-redshift bin to 0.02 for p(z)
and 0.24 for zb, and has a progressively smaller effect on
higher-redshift bins.
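
The bias statistic and the effect of lensing weights can be written compactly. The sketch below assumes the bias is the weighted mean inferred redshift minus the weighted mean true redshift; the function name redshift_bias and every number in the toy bin (95 genuine low-redshift galaxies, 5 interlopers at z = 2.5, interloper weight 0.4) are hypothetical and are not taken from the DLS.

```python
import numpy as np

def redshift_bias(z_inferred, z_true, weights=None):
    """Weighted mean inferred redshift minus weighted mean true redshift."""
    return float(np.average(z_inferred, weights=weights)
                 - np.average(z_true, weights=weights))

# Toy low-redshift bin: all galaxies have zb = 0.05, but 5 of 100 truly lie at z = 2.5.
z_true = np.concatenate([np.full(95, 0.05), np.full(5, 2.5)])
zb = np.full(100, 0.05)

# Assumed lensing weights: faint high-redshift interlopers are down-weighted.
weights = np.concatenate([np.ones(95), np.full(5, 0.4)])

bias_unweighted = redshift_bias(zb, z_true)           # about -0.1225
bias_weighted = redshift_bias(zb, z_true, weights)    # about -0.0505
```

For the p(z) approaches, z_inferred would be replaced by the zmc values of the same galaxies, or the inferred mean would be taken from the summed p(z). Down-weighting the interlopers shrinks the zb bias but does not remove it.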

Fig. 7 of Ma et al. (2006) shows the degradation in
dark energy parameters for a wide and very deep weak
lensing survey, as a function of the tightness of priors that
can be put on the redshift bias and scatter in photometric
redshift bins. It shows that loose priors of order 0.2 result
in 80-85% degradation in w0 estimates (with respect to
a survey with absolutely no redshift errors). If, on the
other hand, one need only allow for a ~ 0.02 bias as
with the p(z) approach, the degradation decreases to 50-60%.
For estimating wa, the degradation decreases from
a factor of six to a factor of about 2.5 by employing p(z).

These are only rough estimates, for a number of reasons.
Future surveys as deep as those contemplated
by Ma et al. (2006) will use more extensive filter sets,
which will probably improve the zb performance somewhat
with respect to the p(z) performance. And nearer-term,
shallower surveys have looser redshift requirements
because their shear measurements are not as precise. But
it is clear that using p(z) greatly improves the survey at
essentially no cost. We also conducted simulations using
a more extensive filter set, to check the generality 

5. REMAINING ERRORS AND LIMITS OF THIS WORK 

The simulations are simplistic in that the same six SED 
templates (and priors) used to infer p(z) are used to generate
the mock catalogs. In real photometric redshift catalogs,
p(z) will be less perfect because galaxy SEDs are
more varied and priors are imperfectly known. Smaller 
effects of the same nature include uncertainties in real- 
life filter and throughput curves, which are artificially 
reduced to zero here. However, these errors also affect 
the most probable redshift. Therefore, although the sim- 
ulated results presented here are optimistic overall (given 
this filter set), the performance of p(z) relative to using 
the most probable redshift may not be. More sophisti- 
cated simulations beyond the scope of this paper will be 
required to determine the limits of p(z) accuracy for any 
given filter set and survey depth. 

The remaining redshift bias, 0.01-0.02, is not trivial.
This is an order of magnitude larger than required to
keep the w0 degradation within 10% of an ideal survey
(for the deepest surveys; requirements are less stringent
for shallower surveys). However, a few factors are actually
pessimistic here compared to future large surveys:
the limited filter set, and relatively large zeropoint errors
(~ 0.03 mag here vs. 0.01 mag for SDSS and future large
surveys). A more extensive filter set will improve both
the p(z) and the zb estimates, but will probably improve
the zb estimate more as it eliminates some degeneracies.
Again, survey-specific simulations will be required
to draw more specific conclusions.

Finally, there is a source of bias not simulated here:
Eddington (1913) bias. The type and redshift priors are
based on magnitude, but at the faint end magnitudes
are biased due to the asymmetry between the large number
of faint galaxies that noise can scatter to brighter
magnitudes, versus the smaller number of moderately
bright galaxies that noise can scatter to fainter magnitudes.
Surveys wishing to derive photometric redshifts
for galaxies detected at, say, 10σ or fainter, must use the
Hogg & Turner (1998) prescription for removing Eddington
bias from each galaxy's flux measurements if they are
to avoid nontrivial systematic errors.
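
The asymmetry behind Eddington bias is easy to reproduce in a toy Monte Carlo. The sketch below only demonstrates the effect and does not implement the Hogg & Turner (1998) correction; it works in flux rather than magnitude, and the power-law slope of the number counts, the noise level, and the selection window are all assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_gal = 200_000
slope = 1.5      # assumed number counts: N(>F) proportional to F**(-slope)
sigma = 1.0      # Gaussian flux noise, so fluxes are in units of the noise

# Classical Pareto distribution with minimum true flux 1: many faint, few bright sources.
true_flux = rng.pareto(slope, n_gal) + 1.0
obs_flux = true_flux + rng.normal(0.0, sigma, n_gal)

# Select sources observed near 5 sigma and compare observed with true fluxes.
selected = (obs_flux > 4.5) & (obs_flux < 5.5)
mean_obs = obs_flux[selected].mean()
mean_true = true_flux[selected].mean()
```

In this toy model the selected sources are on average intrinsically fainter than they appear, because more faint sources scatter up into the window than bright sources scatter down. A magnitude-based prior applied to the observed magnitudes would therefore be evaluated at too bright a magnitude.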



6. SUMMARY 

We have shown that using the photometric redshift 
probability distribution p(z) greatly reduces photometric 
redshift systematic errors, as compared to using a simple 
one-point estimate such as the most probable redshift or 
the mean or median of p(z). Various authors have made 
similar points previously, particularly Fernandez-Soto et 
al. (2002), who wrote that "this information [p(z)] can 
and must be used in the calculation of any observable 
quantity that makes use of the redshift." However, adop- 
tion of this practice has been slow to nonexistent, even 
among authors who are aware of the point, because it 
is cumbersome to track a full p(z) for each galaxy. We 
have shown that a very simple modification to photomet- 
ric redshift codes, namely choosing a Monte Carlo sample 
from the p(z), produces a single number for each galaxy 
which greatly reduces the systematic errors compared to 
using any other one-point estimate such as the mean, 
median, or mode of p(z). In contrast to approaches 
which simply reject galaxies which could be outliers, this 
method can make use of every galaxy in a survey. We 
have shown that this method results in substantial im- 
provements in a flagship application, estimating dark en- 
ergy parameters from weak lensing, at no cost to the 
survey. 